Spoken digit recognition application on cAInvas

The audio dataset used here is a subset of the TensorFlow Speech Commands dataset. Each sample is a 1-second mono recording sampled at 8000 Hz. The dataset is balanced, with 2360 samples in each class. Audio can be represented in many ways, including raw waveforms, spectrograms, Mel spectrograms, and MFCCs. Among these, Mel-based representations are closer to human auditory perception, because the Mel scale spaces frequencies the way listeners perceive pitch rather than linearly in Hz.
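
To make the difference between the Mel scale and a linear Hz scale concrete, here is a minimal sketch in plain Python. It assumes the widely used HTK-style conversion, m = 2595 * log10(1 + f / 700); the article does not say which Mel formula its pipeline uses, so treat the formula, the helper names (`hz_to_mel`, `mel_to_hz`, `mel_band_edges`) and the choice of 8 bands as illustrative assumptions. The only value taken from the text is the 8000 Hz sample rate, which caps the usable frequency range at 4000 Hz.

```python
import math

SAMPLE_RATE = 8000          # Hz, as stated for the dataset
NYQUIST = SAMPLE_RATE / 2   # highest frequency representable at this rate


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to Mel using the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands: int, f_min: float = 0.0, f_max: float = NYQUIST):
    """Return n_bands + 2 edge frequencies (Hz), evenly spaced on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


if __name__ == "__main__":
    # Hypothetical choice of 8 bands, purely for illustration.
    edges = mel_band_edges(8)
    for low, high in zip(edges, edges[1:]):
        print(f"{low:8.1f} Hz -> {high:8.1f} Hz  (width {high - low:7.1f} Hz)")
```

Running the script prints band edges whose widths in Hz grow with frequency: steps that are equal on the Mel scale are narrow at low frequencies and wide at high ones, which is the sense in which a Mel representation gives more resolution where human hearing is more sensitive to pitch changes.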